 double-blind review


MoQE: Improve Quantization Model performance via Mixture of Quantization Experts

arXiv.org Artificial Intelligence

Quantization plays a crucial role in improving model efficiency and reducing deployment costs, enabling the widespread application of deep learning models on resource-constrained devices. However, the quantization process inevitably introduces accuracy degradation. In this paper, we propose Mixture of Quantization Experts (MoQE), a quantization inference framework based on the Mixture-of-Experts (MoE) architecture, aiming to jointly improve the performance of quantized models. MoQE combines multiple quantization variants of one full-precision model as specialized "quantization experts" and dynamically routes input data to the most suitable expert based on its characteristics. MoQE alleviates the performance degradation commonly seen in single quantized models through specialized quantization expert models. We design lightweight, structure-aware router models tailored for both CV and NLP tasks. Experimental evaluations on ResNet, LLaMA, and Qwen model families across benchmark datasets including ImageNet, WikiText, C4, and OpenWebText demonstrate that MoQE achieves performance comparable to SOTA quantization models, without incurring significant increases in inference latency.
Quantization plays a pivotal role in the field of machine learning, particularly in enhancing model efficiency and reducing resource consumption. As deep learning models grow increasingly complex, their demand for computational resources escalates, constraining deployment on resource-limited devices and increasing operational costs. Furthermore, quantization streamlines the model optimization pipeline, enabling developers to achieve efficient deployment within shorter timeframes and accelerating time-to-market for AI-driven products. Consequently, quantization serves not only as a critical enabler for improving the accessibility and practicality of machine learning models but also as a key facilitator in the broader dissemination of artificial intelligence technologies. However, the practical deployment of quantization methods faces several critical challenges.
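The routing idea in the abstract can be illustrated with a small sketch. This is not the MoQE implementation: the linear router, the symmetric uniform quantizer, and the names `quantize`, `make_expert`, `route`, and `moqe_forward` are all assumptions made for illustration, and the "model" is reduced to a single dot product.

```python
from typing import Callable, List, Sequence, Tuple


def quantize(weights: Sequence[float], bits: int) -> List[float]:
    """Symmetric uniform quantization to `bits` bits, returned in dequantized form."""
    levels = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights) or 1.0     # avoid a zero scale
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def make_expert(weights: Sequence[float], bits: int) -> Callable[[Sequence[float]], float]:
    """Build one 'quantization expert': the same model, quantized at a given bit width."""
    q = quantize(weights, bits)

    def expert(x: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(q, x))

    return expert


def route(x: Sequence[float], router_weights: Sequence[Sequence[float]]) -> int:
    """Hypothetical linear router: score each expert and pick the highest (top-1)."""
    scores = [sum(r * xi for r, xi in zip(row, x)) for row in router_weights]
    return max(range(len(scores)), key=scores.__getitem__)


def moqe_forward(
    x: Sequence[float],
    experts: Sequence[Callable[[Sequence[float]], float]],
    router_weights: Sequence[Sequence[float]],
) -> Tuple[int, float]:
    """Send the input to exactly one expert, so only one quantized model runs."""
    idx = route(x, router_weights)
    return idx, experts[idx](x)
```

Because only the selected expert is evaluated, the extra inference cost in this sketch is the router's scoring step, which is consistent with the abstract's claim that latency does not increase significantly.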


On The Vulnerability of Recurrent Neural Networks to Membership Inference Attacks

arXiv.org Artificial Intelligence

We study the privacy implications of deploying recurrent neural networks in machine learning. We consider membership inference attacks (MIAs) in which an attacker aims to infer whether a given data record has been used in the training of a learning agent. Using existing MIAs that target feed-forward neural networks, we empirically demonstrate that the attack accuracy wanes for data records used earlier in the training history. Alternatively, recurrent networks are specifically designed to better remember their past experience; hence, they are likely to be more vulnerable to MIAs than their feed-forward counterparts. We develop a pair of MIA layouts for two primary applications of recurrent networks, namely, deep reinforcement learning and sequence-to-sequence tasks. We use the first attack to provide empirical evidence that recurrent networks are indeed more vulnerable to MIAs than feed-forward networks with the same performance level. We use the second attack to showcase the differences between the effects of overtraining recurrent and feed-forward networks on the accuracy of their respective MIAs. Finally, we deploy a differential privacy mechanism to resolve the privacy vulnerability that the MIAs exploit. For both attack layouts, the privacy mechanism degrades the attack accuracy from above 80% to 50%, which is equal to guessing the data membership uniformly at random, while trading off less than 10% utility.
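To make the accuracy figures concrete, the following sketch shows a simple loss-threshold membership inference attack and how its accuracy is measured. It is a generic baseline, not one of the two attack layouts developed in the paper; the function names and the threshold value are assumptions chosen for illustration.

```python
from typing import Sequence


def threshold_attack(loss: float, threshold: float) -> bool:
    """Guess 'member' when the model's loss on the record is below the threshold."""
    return loss < threshold


def attack_accuracy(
    member_losses: Sequence[float],
    nonmember_losses: Sequence[float],
    threshold: float,
) -> float:
    """Fraction of records whose membership the attack guesses correctly."""
    correct = sum(threshold_attack(l, threshold) for l in member_losses)
    correct += sum(not threshold_attack(l, threshold) for l in nonmember_losses)
    return correct / (len(member_losses) + len(nonmember_losses))
```

When members and non-members produce indistinguishable losses, this attack scores 50% on a balanced set, which is the random-guessing level the abstract reports after the privacy mechanism is applied.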


Yann LeCun Paper Rejected - Power Of Double-Blind Review

#artificialintelligence

Yann Andre LeCun, a French computer scientist who focuses on machine learning, computer vision, mobile robotics, and computational neuroscience, recently tweeted that one of his papers was rejected by NeurIPS 2021. Yann LeCun is a Silver Professor at New York University's Courant Institute of Mathematical Sciences and Vice President, Chief AI Scientist at Facebook. He is well known for his work on optical character recognition and computer vision using convolutional neural networks (CNNs) and is often regarded as the inventor of convolutional nets. He is also a co-creator of the DjVu image compression technology. LeCun has academic and industrial experience in artificial intelligence, machine learning, deep learning, computer vision, intelligent data analysis, data mining, data compression, digital library systems, and robotics.


Transfer Learning of Graph Neural Networks with Ego-graph Information Maximization

arXiv.org Machine Learning

Graph neural networks (GNNs) have shown superior performance in various applications, but training dedicated GNNs can be costly for large-scale graphs. Some recent work has started to study the pre-training of GNNs. However, none of it provides theoretical insights into the design of the frameworks, or clear requirements and guarantees for the transferability of GNNs. In this work, we establish a theoretically grounded and practically useful framework for the transfer learning of GNNs. Firstly, we propose a novel view of the essential graph information and advocate capturing it as the goal of transferable GNN training, which motivates the design of Ours, a novel GNN framework based on ego-graph information maximization to analytically achieve this goal. Secondly, we specify the requirement of structure-respecting node features as the GNN input, and derive a rigorous bound on GNN transferability based on the difference between the local graph Laplacians of the source and target graphs. Finally, we conduct controlled synthetic experiments to directly justify our theoretical conclusions. Extensive experiments on real-world networks for role identification show consistent results in the rigorously analyzed setting of direct transferring, while those for large-scale relation prediction show promising results in the more generalized and practical setting of transferring with fine-tuning.
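The two objects the abstract relies on, ego-graphs and local graph Laplacians, can be sketched as follows. This is only an illustration under assumptions: it uses the unnormalized Laplacian L = D - A on an undirected graph without self-loops or duplicate edges, and the paper's exact normalization and transferability bound are not reproduced here. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List


def ego_graph(adj: Dict[Hashable, List[Hashable]], center: Hashable, k: int) -> List[Hashable]:
    """Return the sorted nodes within k hops of `center`, found by breadth-first search."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        if dist[u] == k:          # do not expand past the k-hop boundary
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sorted(dist)


def local_laplacian(adj: Dict[Hashable, List[Hashable]], nodes: List[Hashable]) -> List[List[int]]:
    """Unnormalized Laplacian (degree minus adjacency) of the subgraph induced by `nodes`."""
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)
    lap = [[0] * size for _ in range(size)]
    for u in nodes:
        for v in adj[u]:
            if v in index:        # ignore edges leaving the ego-graph
                lap[index[u]][index[v]] = -1
                lap[index[u]][index[u]] += 1
    return lap
```

Comparing such local Laplacians between a source and a target graph is the kind of structural difference the stated bound depends on.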


Unified recurrent neural network for many feature types

arXiv.org Machine Learning

There are time series that are amenable to recurrent neural network (RNN) solutions when treated as sequences, but some series, e.g. In order to address such situations, we introduce a unified RNN that handles five different feature types, each in a different manner. Our RNN framework separates sequential features into two groups depending on their frequency, which we call sparse and dense features, and which affect cell updates differently. Further, we also incorporate time features at the sequential level that relate to the time between specified events in the sequence and are used to modify the cell's memory state. We also include two types of static (whole-sequence-level) features, one related to time and one not, which are combined with the encoder output. The experiments show that the proposed modeling framework improves performance compared to standard cells.
The study of time series has a long history, and its literature covers many different methods (Hamilton, 1994). The study of asynchronous time series is an important subset of this. Asynchronous time series are series for which features are sampled at irregular time intervals, and at any given time step new values of any subset of features may be present. When a feature does not change value often, it can be treated as being present only at times of change.
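The last point, treating a rarely changing feature as present only at times of change, can be sketched in a few lines. The function name `change_events` and the `(time, value)` tuple layout are assumptions for illustration, not the paper's data format.

```python
from typing import Any, Iterable, List, Tuple


def change_events(samples: Iterable[Tuple[float, Any]]) -> List[Tuple[float, Any]]:
    """Keep only the (time, value) samples at which the value differs from the previous one."""
    events: List[Tuple[float, Any]] = []
    last = object()               # sentinel that differs from every real value
    for t, value in samples:
        if value != last:
            events.append((t, value))
            last = value
    return events
```

The resulting event list is irregularly spaced in time, which is what makes the series asynchronous in the sense described above.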